================================================================================
  DATA CODEBOOK: countrycode.xlsx
================================================================================

  Paper:   Disentangling the Three Facets of Mass Ideological Polarization:
           A Network Approach across 78 Societies
  Authors: Yufan Guo, Yilang Peng, Tian Yang
  Journal: Public Opinion Quarterly

  File:    countrycode.xlsx
  Sheets:  Sheet1 (country crosswalk), Sheet2 (country-level covariates)

  Description:
  This file provides (1) the mapping between country names used in the
  analysis and their ISO country codes, and (2) country-level covariate
  data used in the regression analyses reported in the main text (Table 3,
  Figure 5) and Supplementary Information.

  Note on coding: Standard ISO 3166-1 alpha-2 and alpha-3 codes are used
  throughout, with one exception: Northern Ireland is assigned the
  non-standard code "NIR" in both the 2-letter column (countrycode) and
  the 3-letter column (country_text_id) rather than the UK national
  codes "GB"/"GBR", because it is treated as a distinct survey unit in
  the EVS/WVS data.

  Note on missing values: Missing values are represented as blank cells in
  the Excel file, which R reads as NA. No numeric sentinel codes (e.g.,
  -99, 999) are used.

================================================================================
  SHEET 1 — COUNTRY CROSSWALK
================================================================================

  Dimensions: 92 rows × 2 columns
  Unit of observation: Country (one row per country)
  Purpose: Maps country names used in the analysis to ISO 2-letter codes.
           Used in replication.R to label survey respondents by country.

  ── VARIABLES ────────────────────────────────────────────────────────────────

  country
    Type   : Character (string)
    Description: Country name as used in the analysis. Multi-word names
                 use underscores in place of spaces (e.g., "Great_Britain",
                 "Bosnia_Herzegovina", "United_States").
    Values : 92 unique values; no missing values.
    Notes  : The 92 countries comprise the 78 analytic societies plus 14
             societies excluded from the analysis (those lacking the
             ideology self-placement item or another required survey item).

  countrycode
    Type   : Character (string)
    Description: ISO 3166-1 alpha-2 country code (2-letter). Used as the
                 join key between Sheet1 and the EVS/WVS microdata
                 (cntry_AN variable in the raw data after recoding).
    Values : 92 unique values; no missing values.
    Notes  : Northern Ireland uses the non-standard code "NIR" rather than
             "GB" to preserve it as a distinct survey unit. All other
             entries follow the ISO 3166-1 alpha-2 standard.
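
  The crosswalk lookup described above can be sketched as follows. This
  is a minimal Python illustration, not the code in replication.R (which
  is written in R). The three crosswalk rows, the "Northern_Ireland"
  spelling, and the respondent records are assumptions made up for the
  example; only the "NIR" code and the cntry_AN variable name come from
  this codebook.

```python
# Hypothetical excerpt of Sheet1 (country name -> countrycode).
# The "NIR" code is documented in this codebook; the other rows and the
# exact country spellings are assumptions for illustration.
crosswalk = {
    "Great_Britain": "GB",
    "Northern_Ireland": "NIR",  # non-standard code, kept as its own unit
    "United_States": "US",
}

# Invert the crosswalk so respondents can be labeled from their code.
code_to_country = {code: name for name, code in crosswalk.items()}

# Hypothetical respondent records carrying the recoded country code.
respondents = [
    {"id": 1, "cntry_AN": "GB"},
    {"id": 2, "cntry_AN": "NIR"},
    {"id": 3, "cntry_AN": "XX"},  # code that is absent from the crosswalk
]


def label_country(rows, lookup):
    """Attach a country name to each row; unmatched codes become None."""
    return [{**row, "country": lookup.get(row["cntry_AN"])} for row in rows]


labeled = label_country(respondents, code_to_country)

# Listing unmatched respondents is a cheap check that every code in the
# microdata has a counterpart in the crosswalk.
unmatched = [row["id"] for row in labeled if row["country"] is None]
```

  Because Northern Ireland has its own code, its respondents stay
  separate from those of Great Britain after the lookup.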



================================================================================
  SHEET 2 — COUNTRY-LEVEL COVARIATES
================================================================================

  Dimensions: 125 rows × 10 columns
  Unit of observation: Country-survey wave (one row per country per EVS/WVS
                       survey wave conducted in that country).
  Purpose: Provides country-level predictor variables matched to the year
           of each country's EVS/WVS fieldwork. For countries with multiple
           rows (different waves), replication.R (Section 9) averages the
           covariate values across waves to produce a single country-level
           score for the main regression analysis.
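
  The wave-averaging step can be sketched as follows. This is a minimal
  Python illustration on made-up rows, not the code in replication.R.
  Two choices here are assumptions that the codebook does not state:
  missing values are skipped when averaging, and a country with no
  observed value for a covariate receives a missing value.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical Sheet2-style rows; country codes and values are made up.
rows = [
    {"countrycode": "AA", "year": 2017, "GDP": 1000.0, "HDI": 0.70},
    {"countrycode": "AA", "year": 2022, "GDP": 3000.0, "HDI": None},
    {"countrycode": "BB", "year": 2018, "GDP": 500.0, "HDI": 0.60},
]


def average_by_country(rows, key, covariates):
    """Average each covariate within a country, skipping missing values."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[key]].append(row)

    result = {}
    for country, group in grouped.items():
        result[country] = {}
        for var in covariates:
            observed = [r[var] for r in group if r[var] is not None]
            # Assumption: no observed values at all yields None (missing).
            result[country][var] = fmean(observed) if observed else None
    return result


country_means = average_by_country(rows, "countrycode", ["GDP", "HDI"])
```

  Countries with a single row pass through unchanged, so the result has
  one record per country regardless of how many waves were fielded.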


  ── VARIABLES ────────────────────────────────────────────────────────────────

  country
    Type        : Character (string)
    Description : Country name (same coding as Sheet1).
    Values      : 92 unique country names; no missing values.

  countrycode
    Type        : Character (string)
    Description : ISO 3166-1 alpha-2 country code (same as Sheet1).
    Values      : 92 unique codes; no missing values.

  country_text_id
    Type        : Character (string)
    Description : ISO 3166-1 alpha-3 country code (3-letter). Used as
                  the merge key to join Sheet2 with V-Dem v14 covariates
                  in replication.R (Section 9).
    Values      : 92 unique codes; no missing values.
    Notes       : Northern Ireland is coded "NIR" (non-standard).
                  All other entries follow ISO 3166-1 alpha-3.

  year
    Type        : Integer
    Description : Calendar year of EVS/WVS fieldwork in that country.
                  Determines which annual covariate values (GDP, HDI, etc.)
                  are attached to each country-wave observation.
    Range       : 2017–2023
    Missing     : None (0 missing values)

  GDP
    Type        : Numeric (continuous)
    Description : GDP per capita in current US dollars (World Bank World
                  Development Indicators). Values correspond to the survey
                  year for each country.
    Missing     : None (0 missing values)
    Use in analysis: Log-transformed before use in regression models
                     (Section G in SI). Variable is labeled "gdp_log"
                     after transformation.
    Source      : World Bank World Development Indicators.
                  https://data.worldbank.org/indicator/NY.GDP.PCAP.CD
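
    The log transformation can be sketched as follows. This is a minimal
    Python illustration with made-up values, not the code in
    replication.R. The base of the logarithm is not stated in this
    codebook; the natural log used here is an assumption.

```python
import math


def log_transform(value):
    """Natural log of a positive value; None (missing) passes through."""
    if value is None:
        return None
    if value <= 0:
        raise ValueError("log transform requires a positive value")
    return math.log(value)  # assumption: natural log, base e


# Hypothetical GDP per capita values in current US dollars.
gdp_values = [1000.0, 45000.0, None]
gdp_log = [log_transform(v) for v in gdp_values]
```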

  polstability
    Type        : Numeric (continuous)
    Description : World Bank Worldwide Governance Indicators: Political
                  Stability and Absence of Violence/Terrorism. Reflects
                  perceptions of the likelihood of political instability
                  and/or politically motivated violence.
    Missing     : None (0 missing values)
    Notes       : The scale runs approximately -2.5 to +2.5. Negative
                  values are substantively meaningful (greater
                  instability); they are NOT missing-data codes.
    Source      : World Bank Worldwide Governance Indicators.
                  https://info.worldbank.org/governance/wgi/

  population
    Type        : Numeric (integer, stored as float)
    Description : Total mid-year population. Values correspond to the
                  survey year for each country.
    Missing     : None (0 missing values)
    Use in analysis: Log-transformed before use in regression models.
                     The transformed variable keeps the name "population"
                     in replication.R, so after that step the name refers
                     to logged values.
    Source      : World Bank World Development Indicators.
                  https://data.worldbank.org/indicator/SP.POP.TOTL

  HDI
    Type        : Numeric (continuous)
    Description : United Nations Development Programme Human Development
                  Index. A composite index measuring average achievement
                  in health, education, and standard of living.
    Missing     : 5 (represented as blank cells / NA)
    Notes       : HDI values were matched to the closest available year
                  when the exact survey year was not published.
                  A quadratic term (HDI²) is included in all regression
                  models to capture the hypothesized curvilinear
                  relationship with polarization.
    Source      : UNDP Human Development Reports.
                  https://hdr.undp.org/data-center/human-development-index
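
    The quadratic specification can be sketched as follows. This is a
    minimal Python illustration, not the code in replication.R. The
    function names are hypothetical, and the coefficients in the usage
    line are made up for the example; they are not estimates from the
    paper.

```python
def quadratic_terms(hdi):
    """Return (HDI, HDI squared); a missing HDI yields two missing terms."""
    if hdi is None:
        return None, None
    return hdi, hdi ** 2


def turning_point(b1, b2):
    """HDI value at which b1*HDI + b2*HDI**2 reaches its peak or trough."""
    if b2 == 0:
        raise ValueError("no turning point without a quadratic term")
    return -b1 / (2 * b2)


linear, squared = quadratic_terms(0.5)

# Made-up coefficients: a positive linear and negative quadratic term
# describe an inverted U, here peaking at HDI = 0.5.
peak = turning_point(2.0, -2.0)
```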

  pressfreedom
    Type        : Numeric (continuous)
    Description : Reporters Without Borders (RSF) World Press Freedom Index.
                  Higher values indicate greater press freedom.
    Missing     : 2 (represented as blank cells / NA)
    Source      : Reporters Without Borders World Press Freedom Index.
                  https://rsf.org/en/index

  ethnicfract
    Type        : Numeric (continuous)
    Description : Ethnic fractionalization index. Measures the probability
                  that two randomly selected individuals in a country
                  belong to different ethnic groups. Ranges from 0
                  (fully homogeneous) to 1 (fully heterogeneous).
    Missing     : 16 (represented as blank cells / NA)
    Notes       : This variable is time-invariant: countries with multiple
                  rows carry the same ethnicfract value across all years.
                  Missing values reflect countries not covered by the
                  original Alesina et al. (2003) dataset.
    Source      : Alesina, A., Devleeschauwer, A., Easterly, W., Kurlat, S.,
                  & Wacziarg, R. (2003). Fractionalization. Journal of
                  Economic Growth, 8(2), 155–194.
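
    The index is one minus the sum of squared group shares, which equals
    the probability that two randomly drawn individuals belong to
    different groups. A minimal Python sketch of that formula follows;
    the function name and the group shares are hypothetical, and the
    values stored in this file come from the published dataset rather
    than from a computation like this one.

```python
def fractionalization(shares):
    """Return 1 - sum(s_i ** 2) for population shares that sum to 1."""
    if any(s < 0 for s in shares) or abs(sum(shares) - 1.0) > 1e-9:
        raise ValueError("shares must be non-negative and sum to 1")
    return 1.0 - sum(s * s for s in shares)


# Hypothetical group shares for three made-up societies.
homogeneous = fractionalization([1.0])         # a single group
two_equal = fractionalization([0.5, 0.5])      # two equal groups
four_equal = fractionalization([0.25] * 4)     # four equal groups
```

    With n equal-sized groups the index equals 1 - 1/n, so it approaches
    1 only as the number of groups grows large.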

  ── NOTE ON ADDITIONAL COVARIATES USED IN REPLICATION.R ─────────────────────

  The regression analyses in replication.R use additional country-level
  variables that are NOT stored in countrycode.xlsx. They are obtained by
  merging Sheet2 with the V-Dem Country-Year dataset
  (V-Dem-CY-Full+Others-v14.rds) in Section 9 of the script:

  polparrel  (Political Parallelism)
    Source variable : v2mebias in V-Dem v14
    Description     : Degree to which the media system is politically
                      partisan or biased toward the governing party/parties.
                      Higher values indicate less media bias (more neutral).
    Source          : V-Dem Project, v14.
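
  The merge with V-Dem can be sketched as follows. This is a minimal
  Python illustration on made-up rows, not the code in replication.R.
  The codebook names country_text_id as the merge key; matching on year
  as well is an assumption, made here because V-Dem is a country-year
  dataset. The function name and all values are hypothetical.

```python
# Hypothetical Sheet2 rows (only the key columns are shown).
sheet2 = [
    {"country_text_id": "AAA", "year": 2018},
    {"country_text_id": "NIR", "year": 2018},
]

# Hypothetical V-Dem country-year rows with made-up v2mebias values.
vdem = [
    {"country_text_id": "AAA", "year": 2018, "v2mebias": 1.2},
    {"country_text_id": "AAA", "year": 2019, "v2mebias": 1.3},
]


def left_join_vdem(sheet2_rows, vdem_rows,
                   source_var="v2mebias", target_var="polparrel"):
    """Left-join one V-Dem variable onto Sheet2 by (code, year)."""
    index = {
        (row["country_text_id"], row["year"]): row[source_var]
        for row in vdem_rows
    }
    return [
        {**row, target_var: index.get((row["country_text_id"], row["year"]))}
        for row in sheet2_rows
    ]


merged = left_join_vdem(sheet2, vdem)
```

  A left join keeps every Sheet2 row. A non-standard code such as "NIR"
  receives a value only if the V-Dem file carries the same identifier;
  otherwise the merged variable is missing for that row.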


  V-Dem validation variables (SI Section E only):
    v2xps_party  — Party System Institutionalization index
    v2pscomprg   — Party competition across regions
    v2psplats    — Distinct party platforms
    v2smpolsoc   — Social polarization (reversed)



